[RouteOrch] Fix notification pipeline latency regression in ResponsePublisher #4389
tirupatihemanth wants to merge 3 commits into sonic-net:master
Conversation
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR adjusts ResponsePublisher’s Redis notification pipeline configuration to eliminate added notification latency in the RouteOrch → fpmsyncd response/offload confirmation path.
Changes:
- Set the Redis pipeline batch size for the notification pipeline (m_ntf_pipe) to 1 to force immediate flush behavior.
- Keep the DB write pipeline (m_db_pipe) at the default batch size to preserve write-throughput optimizations.
Signed-off-by: Hemanth Kumar Tirupati <htirupati@nvidia.com>
Why I did it
PR #4172 changed the ResponsePublisher pipelines from batch size 1 to 128, improving bulk write throughput but leaving route offload notifications buffered until OrchDaemon::flush() runs (up to 1s later). With suppress-fib-pending enabled, this delays the FIB offload reply to zebra, so BGP holds suppressed routes longer and test_bgp_update_timer_single_route fails.
What I did
Added m_publisher.flush() in RouteOrch::doTask() after the post-processing loop. This flushes all buffered notifications once per batch -- immediately after route processing, not on the 1s periodic timer. Bulk batching from PR #4172 is preserved since it's one flush per batch, not per route.
How I verified it